Floating-Point Programmer’s Guide 

Addendum 


Introduction

NOTE

The Floating-Point Programmer's Guide for the Sun Workstation, which describes floating-point programming issues under pre-4.0 versions of SunOS, is to be completely rewritten; the new edition will encompass floating-point programming on both the Sun-4 and the Sun-3 with MC68881 or FPA. This addendum summarizes key changes for SunOS 4.0, based upon an early version of the SunOS 4.0 implementation; final results may vary as to speed or correctness. Most of what follows also applies to the special SunOS Sys4-3.2 release.

For more information about the Sun-4, see The SPARC Architecture Manual.

This Supplement refers to the C compiler, cc, included as part of SunOS 4.0, and to the corresponding FORTRAN compiler, f77, sold as a separate product, FORTRAN 1.1.

Sun Floating-Point Options

cc Optimization Levels

Four optimization levels are available for cc, namely -O4, -O3, -O2, and -O1, described in cc(1). -O1 corresponds to -O in SunOS 3.2. Generally, floating-point code may be compiled at -O4 or -O3 unless excessively long compilation time results; in that case -O2 should provide satisfactory optimization. The default is unoptimized code generation.

FORTRAN Optimization Levels

Three optimization levels are available for f77, namely -O3, -O2, and -O1. Generally, floating-point code may be compiled at -O3 unless excessively long compilation time results; in that case -O2 should provide satisfactory optimization. The default is unoptimized code generation. -O3 and -O2 correspond to -O and -P in SunOS 3.2.

foption(1) is better than -fswitch

foption(1) allows interactive or programmable determination of available floating-point hardware on Sun-3. It is particularly intended for shell scripts that determine at run time which of several executable files to invoke based on available hardware. For instance, such a shell script may select among multiple executables by first switching according to the result of arch(1) and then, within one CPU architecture, switching according to the FPU architecture reported by foption(1). This executable-level switching is the preferred alternative to the -fswitch code-generation option, which imposes a substantial performance penalty on code intended to run on a Sun-3 FPA.
Sun-3 Multiple Libraries and Inline Expansion Templates

On a Sun-3, the multiple directories

    /usr/lib/{ffpa,f68881,fswitch,fsoft}

contain versions of libm.a and libm.il optimized for the indicated floating-point code-generation option. The correct version of libm.a is selected automatically when the compilation option is included on the link-step line:

    cc -ffpa anyc.o -lm
    f77 -ffpa anyf.o

will both link automatically with /usr/lib/ffpa/libm.a.

Inline expansion templates are not used automatically by the compilers; they 
must be explicitly specified by the programmer when needed. Inline expansion 
templates are especially recommended for use when compiling any FORTRAN 
program using complex or doublecomplex variables, and may provide significant 
performance increases in some other FORTRAN and C programs as well. 

The names of the Sun-supplied inline expansion template files have changed since SunOS 3.2. The correct template file is libm.il in the directory corresponding to the floating-point code-generation option:

    cc -O4 -ffpa anyc.c /usr/lib/ffpa/libm.il
    f77 -O3 -f68881 anyf.f /usr/lib/f68881/libm.il

The correct template file must be specified by the programmer. 

NOTE For maximum performance, link -lm prior to -lF77 on a Sun-3.

FORTRAN 1.1 contains only one version of libF77.a, compiled for default floating-point code generation (-fsoft for Sun-3). Certain FORTRAN library routines are also contained in libm, optimized for specific code-generation options. Thus in the following:

    f77 -ffpa any.o
    f77 -ffpa any.o -lm

the first link will search libraries in the order -lF77 -lI77 -lU77 -lm -lc, while the second will search in the order -lm -lF77 -lI77 -lU77 -lm -lc. Any FORTRAN-required routines contained both in libm and libF77 will be linked from the -fsoft-compiled library /usr/lib/libF77.a in the first case, and from the -ffpa-compiled library /usr/lib/ffpa/libm.a in the second case.

Constant Expression Evaluation

Some early versions of the manual were incorrect about constant expression evaluation. cc and f77 evaluate expressions involving only integer constants at compile time. cc also evaluates expressions involving floating-point constants at compile time. f77, however, evaluates expressions involving floating-point constants at run time whenever possible; exceptions arise in cases like

      parameter(f=1.0+epsilon)

which must be evaluated at compile time.

Suppressing Mixed-Code-Generation Warnings

Sun-3 compilers attempt to prevent accidentally mixing -f68881 and -ffpa modules by means of a low-technology trick: such modules include an unsatisfied reference to f68881_used and ffpa_used respectively. These entry points are defined only in /lib/Mcrt1.o and /lib/Wcrt1.o respectively; ld chooses one of these according to the -f... option it gets. Modules compiled with the wrong option cause the following error messages:

    Undefined:
    ffpa_used

or

    Undefined:
    f68881_used

This safety method no longer works in SunOS 4.0 when programs are linked dynamically (the default); the unsatisfied reference, which is never actually used, may remain unresolved indefinitely. Consequently the sequence

    cc -c -ffpa any.c
    cc -f68881 any.o

will link but will not execute correctly, because the Sun-3 FPA initialization code normally included in the link step has been omitted.

Static linking, the default in SunOS 3.2, is invoked with -Bstatic in 4.0, and will detect such errors. Occasionally sophisticated users will have valid reasons to bypass such error checking. This may easily be done by including assembly-language modules that define the unsatisfied externals. Users doing so are responsible for correctly initializing any floating-point devices they do use.

Sun-4 Considerations

Release 4.0 is the first SunOS release to support both Sun-3 and Sun-4.

No -f... Code Generation Options

Unlike Sun-3, there is only one Sun-4 floating-point architecture, SPARC, so no -f... option is needed to specify it. But -fsingle and -fsingle2 are available on Sun-4 just as on Sun-3 floating-point architectures.

Some SPARC Instructions Implemented in Software

The SPARC architecture specifies certain instructions that are not implemented in the Sun-4/260 or 280 hardware. These instructions are therefore not generated by the Sun compilers. However, the instructions are recognized by the Sun-4 assembler and implemented (slowly) by software in the Sun-4 kernel under SunOS 4.0 (but not Sys4-3.2), so that they may be invoked by assembly-language coding if desired. These instructions include:

□ fsqrt[sdx]

□ all extended-precision instructions

Other instructions were listed in early editions of the SPARC manual, but later deleted from the architecture without ever being implemented. These include:

□ fint[sdx]

□ fintrz[sdx]

□ fclass[sdx]

□ fexpo[sdx]

□ fscale[sdx]

□ frem[sdx]

□ fquot[sdx]

□ f[sdx]toir

SPARC Floating-Point Controller

The SPARC CPU board contains two large Fujitsu gate arrays, the SPARC CPU and the floating-point controller (FPC), next to the Weitek 1164/1165. All prototype FPCs in early Sun-4s should have been replaced by Sun Customer Service with a "FAB-4" FPC. The suffix is at the end of the part number on the first line of the label on the FPC. If the FPC is examined and a prototype FPC is found, notify Sun Customer Service. Prototype FPC numbers include:

□ MB610303, MB610303A, MB610303B, MB86910

FAB-4 FPCs may be numbered:

□ MB610303C or MB86910A

Don't Forget to MAKEDEV fpa!

New I/O devices are not usable on a Sun until they have entries in /dev. In most cases these entries are not built into the distributed Sun software and must be made manually, once, by system administrators.

The Sun-3 FPA (and the Sun-2 Sky FFP) are treated as I/O devices by vmunix and must have /dev entries made, and then the system rebooted, in order to get the microcode loaded. The /dev entries must be remade every time a major software installation occurs. The simple but often-overlooked procedure is to become root and then:

    # cd /dev
    # MAKEDEV fpa
    # fastboot

FPA and NFS

As a rule, computational servers and file servers don't mix very well; servers should be allocated to one purpose or the other. In particular, Sun-3s with a heavily used FPA sometimes provide very poor NFS response. The VMEbus timeout setting of the FPA seems to have a significant effect; changing it from 5 microseconds to 4 has helped several systems. If NFS response is a problem in a system with a heavily used FPA, try the following; if it does not help, undo the change in case it was incorrect.

On the Sun-3 FPA, change the settings on the DIP switch near the VMEbus (X = closed, O = open) from:

    Switch    1 2 3 4 5 6 7 8
    Setting   O X O X O X X X

to:

    Switch    1 2 3 4 5 6 7 8
    Setting   X X O X O X X X


Floating-Point Numerics 


Floating-Point Types

Despite extensive documentation available in the IEEE Floating-Point Standard, the MC68881 manual, and the SPARC definition, many questions arise about the details of IEEE floating-point format. For machine-independent coding, the following suffices:

    IEEE Format              Exponent   Significand   Equivalent
                             Bits       Bits          Decimal Precision

    IEEE Single:                 8          24            6-9
      C float
      Fortran REAL
    IEEE Double:                11          53           15-17
      C double
      Fortran DOUBLEPRECISION
    IEEE Double-extended:       15          64           18-21


IEEE Double-extended is the type of the MC68881’s floating-point data registers 
but is not directly available in Sun’s high-level programming languages. 
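As a worked check of the Equivalent Decimal Precision column, both ends of each range can be computed from the significand width p alone. The formulas below are the standard bounds (the same ones later used for C's FLT_DIG and FLT_DECIMAL_DIG); they are supplied here as an illustration rather than quoted from this guide, and they reproduce all three rows of the table:

```latex
\begin{aligned}
d_{\mathrm{low}}(p)  &= \lfloor (p-1)\log_{10} 2 \rfloor, &
d_{\mathrm{high}}(p) &= \lceil 1 + p\log_{10} 2 \rceil, \\
p = 24:\quad d_{\mathrm{low}} &= \lfloor 6.92 \rfloor = 6, &
d_{\mathrm{high}} &= \lceil 8.22 \rceil = 9, \\
p = 53:\quad d_{\mathrm{low}} &= \lfloor 15.65 \rfloor = 15, &
d_{\mathrm{high}} &= \lceil 16.95 \rceil = 17, \\
p = 64:\quad d_{\mathrm{low}} &= \lfloor 18.96 \rfloor = 18, &
d_{\mathrm{high}} &= \lceil 20.27 \rceil = 21.
\end{aligned}
```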
"Equivalent decimal precision" is defined as a range: the smaller number is the greatest number of significant decimal digits that is never more precise than the binary type; the larger number is the least number of significant decimal digits that is never less precise than the binary type.

The ranges of floating-point types can be conveniently determined with a simple program like:

      real r_min_subnormal, r_max_subnormal
      real r_min_normal, r_max_normal
      doubleprecision d_min_subnormal, d_max_subnormal
      doubleprecision d_min_normal, d_max_normal
      print *, r_min_subnormal()
      print *, r_max_subnormal()
      print *, r_min_normal()
      print *, r_max_normal()
      print *, d_min_subnormal()
      print *, d_max_subnormal()
      print *, d_min_normal()
      print *, d_max_normal()
      end

whose output is

    1.40129846E-45
    1.17549421E-38
    1.17549435E-38
    3.40282347E+38
    4.9406564584124654-324
    2.2250738585072009-308
    2.2250738585072014-308
    1.7976931348623157+308

New Include Files

Floating-point definitions are now contained in three files. <sys/ieeefp.h> contains certain definitions of IEEE floating point required in the kernel. <floatingpoint.h> contains the definitions required by, and declarations of, functions implemented in libc.a, including the necessary definitions for correctly-rounded base conversion. <math.h> defines the functions implemented in each version of the expanded libm.a. <math.h> includes <floatingpoint.h>, which in turn includes <sys/ieeefp.h>. See floatingpoint(3) and intro(3M).

New Functions

The mathematical function library, libm.a, has been substantially respecified and reimplemented. All (3M) man pages should be reviewed. Some of the changes affect atan2(0,0), pow(x,0), hypot(∞,x), and so on.
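For C programmers, the eight limits printed by the FORTRAN range program earlier in this section can be built directly from their IEEE bit patterns. This is a sketch that assumes float and double are IEEE single and double (true on current mainstream hosts); the sp_ and dp_ helper names are hypothetical and are not the libm r_ and d_ functions. On such a host, calling print_ranges() should print the same digits as the FORTRAN listing, differing only in exponent notation.

```c
#include <float.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE single (assumes float is binary32). */
static float sp_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a 64-bit pattern as an IEEE double (assumes double is binary64). */
static double dp_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Single: 1 sign bit, 8 exponent bits, 23 stored fraction bits. */
static float sp_min_subnormal(void) { return sp_from_bits(UINT32_C(0x00000001)); } /* exponent 0, fraction 1 */
static float sp_max_subnormal(void) { return sp_from_bits(UINT32_C(0x007FFFFF)); } /* exponent 0, fraction all ones */
static float sp_min_normal(void)    { return sp_from_bits(UINT32_C(0x00800000)); } /* exponent 1, fraction 0 */
static float sp_max_normal(void)    { return sp_from_bits(UINT32_C(0x7F7FFFFF)); } /* largest finite exponent */

/* Double: 1 sign bit, 11 exponent bits, 52 stored fraction bits. */
static double dp_min_subnormal(void) { return dp_from_bits(UINT64_C(0x0000000000000001)); }
static double dp_max_subnormal(void) { return dp_from_bits(UINT64_C(0x000FFFFFFFFFFFFF)); }
static double dp_min_normal(void)    { return dp_from_bits(UINT64_C(0x0010000000000000)); }
static double dp_max_normal(void)    { return dp_from_bits(UINT64_C(0x7FEFFFFFFFFFFFFF)); }

/* 9 and 17 significant digits are the upper ends of the single and double
   ranges in the table, so each printed value identifies its binary value. */
static void print_ranges(void)
{
    printf("%.9g\n%.9g\n%.9g\n%.9g\n",
           sp_min_subnormal(), sp_max_subnormal(), sp_min_normal(), sp_max_normal());
    printf("%.17g\n%.17g\n%.17g\n%.17g\n",
           dp_min_subnormal(), dp_max_subnormal(), dp_min_normal(), dp_max_normal());
}
```

Reading the bit patterns makes the structure visible: an all-zero exponent field marks the subnormals, and the largest finite number has an exponent field one below all ones.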
FORTRAN libm Functions

All relevant libm functions are provided in double-precision and single-precision versions designed to be called from FORTRAN. Thus the C double- and single-precision codes

    #include <math.h>

    double x, y;
    y = asinh(x);

    float x, y;
    FLOATFUNCTIONTYPE t;
    t = r_asinh_(&x);
    ASSIGNFLOAT(y, t);

correspond to the FORTRAN codes

      doubleprecision x, y, d_asinh
      y = d_asinh(x)

      real x, y, r_asinh
      y = r_asinh(x)

as described in single_precision(3M), libm_single(3F), and libm_double(3F).

abrupt_underflow Mode

Unlike the asynchronous MC68881, synchronous high-performance floating-point chips, such as the Weitek 1164/1165 used in the Sun-3 FPA and in the Sun-4, are unable to handle subnormal operands or results efficiently in the manner intended by the IEEE floating-point standard. Consequently, software emulation is required to remedy the deficiencies of the hardware by "recomputation" of the correct result from the operands. This occasionally causes numerical programs to suffer extremely poor performance, consuming a very large amount of system time relative to user time. Part of the system time is due to kernel overhead, and part to the time required to do the recomputation.

There are three common cases in which such recomputation adversely affects 
performance. 

1) Underflowed results on multiplication or division, which are not directed to 
be trapped by ieee_handler(3M), are recomputed to determine the 
correct subnormal or zero result. 

2) Subnormal operands, usually the result of previous underflows, are recomputed to determine the correct result.

3) Exponentials of large negative arguments, such as double-precision exp(x) for x < -709, are recomputed to determine the correct subnormal or zero result.
SunOS 4.0 provides a uniform way to obtain abrupt_underflow mode treatment of the 1164/1165; the C and FORTRAN calls

    abrupt_underflow_();

      call abrupt_underflow()

enable abrupt_underflow mode treatment on Sun-3 FPA and Sun-4, which partially corresponds to the "FAST" hardware mode bit of the 1164/1165. abrupt_underflow has no effect with other Sun-3 floating-point options. Its effects in the three common cases are currently as follows:

1) Underflowed results: abrupt_underflow does not affect this case in SunOS 4.0.

2) Subnormal operands: On Sun-3 FPA and Sun-4, abrupt_underflow mode causes subnormal operands to be treated as zero without causing any exception.

3) Exponentials of large negative arguments: On Sun-3 FPA and Sun-4, abrupt_underflow mode causes exponentials that would normally underflow to subnormal or zero results to return zero without causing any exception.

In each case, abrupt_underflow mode can improve performance only when the other modes affecting rounding direction and precision have their default values.
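The difference between the two regimes can be modeled in portable C. The sketch below is an illustration under stated assumptions, not Sun's implementation: flush_operand, mul_gradual, and mul_abrupt are hypothetical names, and the model reproduces only case 2 (subnormal operands treated as zero) while leaving case 1 untouched, as described above for SunOS 4.0. It relies on the C99 fpclassify and signbit macros.

```c
#include <float.h>
#include <math.h>

/* Model of abrupt_underflow's effect on a subnormal operand (case 2 above):
   the operand is treated as a zero of the same sign. Illustration only; this
   is not Sun's abrupt_underflow_() and changes no hardware mode. */
static double flush_operand(double x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0 : 0.0;
    return x;
}

/* IEEE default (gradual underflow): subnormal operands take part as they are. */
static double mul_gradual(double a, double b)
{
    return a * b;
}

/* Abrupt-underflow model: subnormal operands are flushed, but a product that
   underflows is still delivered as a subnormal, matching case 1 above. */
static double mul_abrupt(double a, double b)
{
    return flush_operand(a) * flush_operand(b);
}
```

With tiny = DBL_MIN / 4 (a subnormal), mul_gradual(tiny, 2.0) returns the subnormal DBL_MIN / 2, while mul_abrupt(tiny, 2.0) returns zero.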

abrupt_underflow mode does not conform to the IEEE Standard, and IEEE-based exception handling should not be relied upon when running in abrupt_underflow mode. To return to Standard-conforming behavior in a program, use the following calls:

    gradual_underflow();

      call gradual_underflow()

NOTE Avoid the obsolete fpamode().

In general, the Weitek 1164/1165 FAST mode is intended to bypass normal IEEE exception handling. Thus not only may the numerical results differ from normal IEEE results, but neither IEEE exception reporting, through ieee_flags(), nor IEEE trapping, through ieee_handler() and SIGFPE, should be relied upon. The abrupt_underflow function call is thus named for FAST mode's principal application, although FAST mode could affect other exceptions besides underflow, depending on the floating-point chips and their controllers in a particular implementation.

Porting Applications from non-IEEE Systems

Porting an application from one computer to another always produces interesting side effects, particularly when floating point is involved. The most common porting situation involving Suns is converting applications written for DEC's VAX hardware, VMS operating system, and extended VMS FORTRAN compiler.

Rev A, August 1989 Part No: 800-1789-12

Sun FORTRAN 1.1 now supports most VMS FORTRAN extensions, so that use of non-standard FORTRAN is no longer a serious impediment. However, running programs developed under non-IEEE arithmetic for the first time on an IEEE system often reveals previously unsuspected properties of the programs. Often these have to do with exceptions, particularly underflow. VMS FORTRAN programs typically expect that underflow will be treated by underflowing to zero without generating any exception, and that overflow and division by zero will generate a SIGFPE that is fatal unless a SIGFPE handler has been established. SIGFPE handlers often assume that the only cause of SIGFPE is an overflow or division by zero. Therefore:

□ Programs that underflow frequently and therefore perform poorly may sometimes benefit from abrupt_underflow() as described above. Changing from gradual to abrupt underflow may have an adverse effect on accuracy; however, abrupt underflow may already have been degrading results silently on the non-IEEE system.

□ Programs that depend on halting in the event of overflow or division by zero should use ieee_handler() to obtain such treatment, since the IEEE Standard requires default non-stop exception handling.

□ Programs that install their own SIGFPE handlers must be rewritten to work properly with the Sun-3 FPA, to recognize and treat appropriately the particular SIGFPE code FPE_FPA_ERROR, as described in the SunOS 3.2 floating-point manual.

That floating-point exceptions are occurring in the normal mode might be inferred from very slow performance with much system time relative to user time, or from messages produced by ieee_retrospective().

Trigonometric Argument Reduction Mode

Trigonometric functions for radian arguments outside the range [-π/4, π/4] are usually computed by "reducing" the argument to the indicated range by subtracting integral multiples of π/2.

Since π is not a machine-representable number, it must be approximated somehow; the error in the final computed trigonometric function depends on the rounding errors in argument reduction with an approximate π as well as the rounding and approximation errors in computing the trigonometric function of the reduced argument. Even for fairly small arguments, the relative error in the final result may be dominated by the argument reduction error, while even for fairly large arguments, the error due to argument reduction may be no worse than the other errors. See the March 1981 issue of the IEEE's Computer magazine, page 71.

There is a widespread misapprehension that trigonometric functions of all large arguments are inherently inaccurate, and of all small arguments relatively accurate, based on the simple observation that large enough machine-representable numbers are separated by a distance greater than π. However, there is no inherent boundary at which computed trigonometric function values suddenly become bad, nor are the "inaccurate" function values useless. Provided that the argument reduction is done consistently, the fact that it is performed with an approximation to π is practically undetectable, since all essential identities and relationships are as well preserved for large arguments as for small.

There are several consistent ways to perform trigonometric argument reduction; SunOS 4.0 provides three. Perhaps most satisfying to mathematicians is to generate an approximation to π of such precision that the roundoff due to argument reduction is never worse than any other roundoff in the final answer. That is as good as having π to infinite precision. For IEEE double precision, this is equivalent to an approximation to π of over one thousand bits of accuracy, so this method is slowest.

Most satisfying to Sun-3 users is to perform argument reduction with the 66-significant-bit approximation to π used in the MC68881 hardware and mimicked in the Sun-3 FPA hardware. This is most efficient on Sun-3s.

Some users prefer that trigonometric argument reductions be performed with an approximation to π representable as an ordinary floating-point variable; for those users, the nearest 53-significant-bit double-precision approximation to π is available. Call that approximation P; it has the property that computed sin(P) == 0, just as the correct sin(π) == 0; furthermore, tan(P/2) == ∞. 53-bit reduction is the most efficient on Sun-4s.


The trigonometric argument reduction mode is selected by assigning to a global C variable fp_pi. Its allowed values are described in <math.h>:

    enum fp_pi_type {
        fp_pi_infinite = 0,    /* Infinite-precision approximation to pi. */
        fp_pi_66       = 1,    /* 66-bit approximation to pi. */
        fp_pi_53       = 2     /* 53-bit approximation to pi. */
    };

    extern enum fp_pi_type fp_pi;


fp_pi is initialized to fp_pi_66 in order to produce the same results as for programs previously run on Sun-3s. fp_pi is directly available to C programs; from FORTRAN it is necessary to call a short C function like this:

      call set_pi_53()

    #include <math.h>

    void
    set_pi_53_()
    {
        fp_pi = fp_pi_53;
    }


The relative performance of infinite, 66-bit, and 53-bit argument reduction varies considerably depending on available hardware and on the magnitudes of the arguments. When most trigonometric arguments are typical ones of magnitude < 10, the choice of argument reduction constant usually has insignificant performance impact.

NOTE Using the f68881/libm.il or ffpa/libm.il inline expansion template files will always cause 66-bit argument reduction to occur, regardless of the setting of fp_pi.

IEEE-Standard Conformance

Correctly-Rounded Base Conversion

C and FORTRAN input and output of decimal floating-point numbers is now correctly rounded to or from the internal binary representation. For extreme exponents, "correctly rounded" is more exacting than the IEEE minimal requirement. Correctly-rounded base conversion is obtained through the usual C functions strtod(3), scanf(3), and printf(3), and through the usual FORTRAN input and output. These functions now can read, as well as write, ASCII representations such as "inf," "infinity," and "NaN."

Finer control, including access to rounding modes and exception flags, can be obtained by use of string_to_decimal(3), decimal_to_floating(3), and floating_to_decimal(3).

The number of significant digits in implicitly formatted FORTRAN list-directed output (print *, x, y, z) has been increased so that sufficient decimal digits are printed to uniquely specify the binary value held internally.

ieee_flags

ieee_flags(3M) or f77_ieee_environment(3F) provide a uniform way of accessing the IEEE modes for rounding direction and rounding precision, and the IEEE exception-occurred accrued status bits, for Sun-3 with -f68881 or -ffpa, and for Sun-4. Use ieee_flags() instead of fpstatus_() and fpmode_().

ieee_handler

ieee_handler(3M) or f77_ieee_environment(3F) provide a uniform way of enabling IEEE traps on floating-point exceptions. Trapped exceptions result in SIGFPE signals. To exploit IEEE trapping conveniently, use ieee_handler() rather than fpmode(), signal(3), or sigvec(2).

Special IEEE Functions

libm now contains implementations of all functions needed for the IEEE Standard, its Appendix, or the IEEE Test Vectors, replacing the special functions listed in Appendix D of the SunOS 3.2 version of the Floating-Point Programmer's Guide manual. Use these (3M) functions or their d_ or r_ counterparts:
IEEE Standard:

    sqrt(3M)                  square root
    remainder(3M)             remainder
    irint(3M)                 convert floating-point value to integral value in integer format
    rint(3M)                  convert floating-point value to integral value in floating-point format
    decimal_to_floating(3)    convert decimal record to binary floating point
    floating_to_decimal(3)    convert binary floating point to decimal record
    ieee_flags(3M)            get/set modes/status
    ieee_handler(3M)          enable/disable trapping

IEEE Appendix:

    copysign(3M)              copysign
    scalb(3M)                 scale by base b to power n in floating-point form
    scalbn(3M)                scale by base b to power n in integer form
    logb(3M)                  exponent in floating-point form
    ilogb(3M)                 exponent in integer form
    nextafter(3M)             nextafter
    finite(3M)                finite
    isnan(3M)                 isnan
    isnan(x)||isnan(y)        unordered
    fp_class(3M)              class

IEEE Test Vectors:

    logb(3M)                  L test vector
    scalb(3M)                 S test vector
    significand(3M)           F test vector


Note that the <math.h> function class() and the <floatingpoint.h> struct member decimal_record.class, defined in the Sys4-3.2 release and some preliminary versions of SunOS 4.0, present an irreconcilable conflict with the "class" reserved word in C++ and other languages. Accordingly, class(), d_class_(), r_class_(), and decimal_record.class are named fp_class(), d_fp_class_(), r_fp_class_(), and decimal_record.fpclass in the released version of SunOS 4.0.
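The decimal_record structure belongs to the base-conversion interface of <floatingpoint.h>. For portable code, the effect of correctly rounded conversion through printf and strtod can be checked directly. The sketch below assumes a C library whose conversions are correctly rounded (glibc's are); round_trip and reads_as_infinity are hypothetical helper names. It shows that 17 significant digits, the top of the 15-17 range for IEEE double, recover the binary value exactly, whereas 15 digits may not.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Write x with the given number of significant digits and read it back.
   With correctly rounded printf and strtod, 17 digits always recover an
   IEEE double exactly; fewer digits may not. */
static double round_trip(double x, int digits)
{
    char buf[40];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL);
}

/* C99 strtod also accepts "inf", "infinity" and "nan", in any letter case. */
static int reads_as_infinity(const char *s)
{
    double v = strtod(s, NULL);
    return isinf(v) && v > 0.0;
}
```

For example, 0.1 + 0.2 printed with 15 digits reads back as 0.3, which is a different double; printed with 17 digits it reads back unchanged.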

Suppressing ieee_retrospective Warnings

ieee_retrospective is a libm function invoked whenever a FORTRAN program terminates normally or abnormally for any reason (not necessarily because of a floating-point exception). Under IEEE arithmetic, exception-occurred bits accrue until explicitly cleared by the programmer. If any IEEE exceptions, other than inexact, occur but are not cleared by the time the program terminates, a message listing all the exceptions still uncleared is appended to standard error:

    Warning: the following IEEE floating-point arithmetic exceptions
    occurred in this program and were never cleared:
    Inexact; Underflow;
No message is printed if the only outstanding exception is inexact. Note that IEEE exceptions arise only on Sun-3 compiled with -f68881 or -ffpa, or on Sun-4.

The significance of the warning may be difficult to determine. "Inexact" implies 
a normal rounding error, which is to be expected in floating-point programs, 
while "Underflow" and "Overflow" warn of rounding errors that may have been 
relatively larger than usual. Whether the underflowed or overflowed result 
affected the final answer can only be determined by analyzing the program. For 
instance, a small underflowed result may suffer significant relative rounding 
error, which matters if it is ultimately multiplied by a large number, but not if it 
is ultimately added to a large number. 

The best way to suppress the message is to determine which operations are generating the exceptions, and then to alter the algorithm to avoid them. ieee_handler(3F) may be helpful in this search.

A second method is to clear all the exceptions at the end of the program by calling ieee_flags(3F). The ieee_retrospective message then will not appear, except possibly as part of an unplanned abnormal termination.


Finally, the message can be fully suppressed by defining an empty subroutine ieee_retrospective in the FORTRAN source:

      subroutine ieee_retrospective()
      end




C programs do not call ieee_retrospective automatically. To obtain its effect, insert a call manually in a C program:

    extern void ieee_retrospective_();

    ieee_retrospective_();


While abrupt_underflow is enabled, exception handling need not conform to IEEE requirements. Thus, in a program that enables abrupt_underflow, the absence of a message from ieee_retrospective does not imply that the results were free of corruption by underflow. That possibility is presumed to have been eliminated by analysis prior to enabling abrupt_underflow.

Benchmarks

The following benchmarks indicate maximum performance for a Sun-3/280 compiled -f68881 or -ffpa, and for a Sun-4/280. FORTRAN programs were compiled with -O3, and the C program spice3b1 with -O2; the Sun-supplied libm.il inline expansion templates were used. Benchmark sources are usually slightly modified for testing purposes; therefore comparisons to results obtained by others on non-Sun systems may be misleading.

The 100x100 Linpack benchmark results in KFLOPS pertain to FORTRAN code with ROLLED BLAS.
The 1000x1000 Linpack benchmark results in KFLOPS pertain to the best performance obtained from FORTRAN codes that do "higher-level granularity" unrolling of 4, 8, or 16 times.

The 100x100 Zlinpack benchmark results in complex-KFLOPS measure performance on a complex or doublecomplex arithmetic translation of the Linpack benchmark. A complex-KFLOP is equivalent to 4 real KFLOPS.

The double-precision Doduc benchmark results in elapsed SECONDS of real time measure performance on the non-linear nuclear reactor simulation benchmark distributed and reported by N. Doduc (uunet.uu.net!mcvax!inria!ftc!ndoduc).

The double-precision FORTRAN and C versions of SPICE, 2G6 and 3B1 respectively, measure elapsed SECONDS of real time for three representative decks: EDGEREG, a Schottky edge-triggered register; COMPARATOR, a differential comparator; and DIGSR, a CMOS digital shift register.


    BENCHMARK RESULTS                3/280      3/280      4/280
    KFLOPS                         -f68881      -ffpa

    100 Linpack double                 115        470       1070
    100 Linpack single                 125        900       1600
    1000 Linpack double                160        600       1140
    1000 Linpack single                180       1050       1700
    100 Zlinpack complex                28        110        150
    100 Zlinpack doublecomplex          26         80        170


    BENCHMARK RESULTS                3/280      3/280      4/280
    ELAPSED SECONDS                -f68881      -ffpa

    Doduc double                      2000        880        530
    spice2g6 EDGEREG                   220        120         61
    spice2g6 COMPARATOR                260        130         68
    spice2g6 DIGSR                     730        380        220
    spice3b1 EDGEREG                   210        100         61
    spice3b1 COMPARATOR                260        125         72
    spice3b1 DIGSR                     810        390        270


The Sun-4 complex Zlinpack results are affected by the choice of inline expansion templates for complex multiplication. Double-precision products are used to maximize robustness against rounding errors and intermediate overflows and underflows, just as on the Sun-3 with

    /usr/lib/f{68881,fpa}/libm.il
When such robustness is not needed, performance can be significantly improved by substituting a user-coded single-precision complex multiplication template.
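The trade-off between the two kinds of template can be illustrated in C. This is a sketch only, not the libm.il templates themselves: fcomplex, cmul_double_products, and cmul_single are hypothetical names. Forming each product of two floats in double precision is exact, because a 24-bit by 24-bit product needs at most 48 significant bits, so the products can neither overflow, underflow, nor be rounded; only the sums and the final conversion to single precision round.

```c
/* Single-precision complex value, laid out like FORTRAN COMPLEX. */
typedef struct {
    float re, im;
} fcomplex;

/* Robust style: each float*float product is exact in double precision;
   the sums are formed in double and converted to single at the end. */
static fcomplex cmul_double_products(fcomplex a, fcomplex b)
{
    double re = (double)a.re * b.re - (double)a.im * b.im;
    double im = (double)a.re * b.im + (double)a.im * b.re;
    fcomplex z = { (float)re, (float)im };
    return z;
}

/* Fast style: all arithmetic in single precision, so each product is rounded
   to float before it is added, and may overflow or underflow on its own. */
static fcomplex cmul_single(fcomplex a, fcomplex b)
{
    fcomplex z = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return z;
}
```

For a = (1 + 2^-12) + i, the double-product version returns the real part of a*a exactly as 2^-11 + 2^-24; computing the same product entirely in single precision may lose the 2^-24 term, although a compiler that contracts expressions into fused multiply-adds can keep it.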

The Linpack benchmarks incorporate a slight modification in the distributed EPSLON routine. Without the modification, the normalized residual is not computed correctly when compiled with -f68881 -O3 without -fstore. The modifications to EPSLON don't affect the benchmark itself or the computation of the actual residual, only the scaling factor applied to compute the normalized residual. The modifications are necessary as a result of improvements in the global optimizer, which is now able to allocate more variables to registers. The following extract reveals the modification in lower case:


      REAL FUNCTION EPSLON (X)
      REAL X
      REAL A,B,C,EPS
C     THIS PROGRAM SHOULD FUNCTION PROPERLY ON ALL SYSTEMS
C     SATISFYING THE FOLLOWING TWO ASSUMPTIONS,
C        1.  THE BASE USED IN REPRESENTING FLOATING POINT
C            NUMBERS IS NOT A POWER OF THREE.
C        2.  THE QUANTITY A IN STATEMENT 10 IS REPRESENTED TO
C            THE ACCURACY USED IN FLOATING POINT VARIABLES
C            THAT ARE STORED IN MEMORY.
C     THE STATEMENT NUMBER 10 AND THE GO TO 10 ARE INTENDED TO
C     FORCE OPTIMIZING COMPILERS TO GENERATE CODE SATISFYING
C     ASSUMPTION 2.
      A = TOREAL(4)/TOREAL(3)
      call dummy(a)
   10 B = A - ONE
      C = B + B + B
      EPS = ABS(C-ONE)
      IF (EPS .EQ. ZERO) GO TO 10
      EPSLON = EPS*ABS(X)
      RETURN
      END

      subroutine dummy(a)
      REAL a
      end


The -fstore option may adversely affect performance and is usually useful
only in code very much like EPSLON, which attempts to determine properties of
machine arithmetic.
MC68881 Mask Differences

New MC6888X varieties

The mc68881version(8) command has been expanded to differentiate B96M MC68882’s
and B81G MC68881’s, as well as the A79J and A93N MC68881’s previously
distinguished. Future Sun products may incorporate B81G MC68881’s and B96M
MC68882’s, so it is worthwhile to review the differences. Note that chips which
mc68881version classifies as "A93N" or "B81G" may be physically marked
as a different, functionally equivalent, mask set.

□ A79J MC68881’s are the original chips shipped in early Sun-3’s. They have
not been shipped by Sun since mid-1986. Their bugs are listed in an appendix
to the SunOS 3.2 version of the Floating-Point Programmer’s Guide.

□ A93N MC68881’s are the standard chips in most Sun-3’s. They have one
known bug: extended-precision addition and subtraction occasionally round
the wrong way when the correct result is very nearly half-way between two
representable extended-precision numbers. This bug is almost undetectable
on Sun systems because extended-precision data types are not directly available
in compiled languages. The bug may be observed by running IEEE test vectors
on extended-precision operands and results directly through assembly-language
coding; it also shows up in the computed residual of the Linpack
benchmark program and may affect any other computed result that is
essentially roundoff noise. Since most application programs avoid printing
out such results, they aren’t much affected by the bug.

□ B81G MC68881’s are identical to A93N’s except that the bug has been
fixed. As of the end of 1987, no Sun products used B81G’s.

□ B96M MC68882’s are functionally identical to B81G MC68881’s in user
mode. Because they implement some parallel execution, their internal state
is more complex than that of MC68881’s, so the information dumped by
the privileged FSAVE instruction is larger. Consequently MC68882’s can’t
be used on Sun operating systems prior to 4.0. As of the end of 1987, no
Sun products used MC68882’s.

Suppressing A79J Warnings

In order to improve performance of the correctly functioning later masks, SunOS
4.0 no longer attempts to work around the shortcomings of the early A79J
MC68881’s. The 3.2 manual describes some of these shortcomings, and mentions
that some have no software workaround anyway.

SunOS 4.0 will print a warning message on standard error if a program compiled
with -f68881 is executed on a Sun-3 with an A79J MC68881:

    Please note: MC68881 upgrade to A93N mask may be advisable. See
    Floating-Point Programmer’s Guide.

A program that does not require extensive floating-point computation can bypass
the message by recompiling with -fsoft.

If it can be determined that none of the A79J shortcomings affects a particular 
program, then the message can be suppressed by including a C function 
/* Defining ma93n_ to return 1 suppresses the A79J warning message */
int ma93n_()
{
    return 1;
}

or a FORTRAN function

C     DEFINING MA93N TO RETURN 1 SUPPRESSES THE A79J WARNING MESSAGE
      integer function ma93n()
      ma93n=1
      end

or by modifying the executable image with adb(1).

System V Interface Compliance

Unlike SunOS 3.2, SunOS 4.0 provides the same SVID compliance for all Sun-3
floating-point code generation options and for the Sun-4. Compliance is not
claimed for certain aspects of the SVID that are contrary to the intent of the
IEEE Standard. See matherr(3M).

The libm.il inline expansion templates do not call matherr(3M) or set
errno, and therefore should not be used if detailed SVID compliance is
required.


Rev A, August 1989 Part No: 800-1789-12 






